ABSTRACT

Standard errors of the pooled mean estimate in multiple
matrix sampling were compared for two procedures. The data were from
tests involving items sampled with and without replacement. The two
procedures involve the formulations of Madow and of Lord and Novick;
the former permits sampling of items with or without replacement,
whereas the latter is to be used for item sampling without
replacement. The results show that the two procedures give
considerably differing error estimates of the pooled mean.
(Author)

Estimating the Standard Error of the Mean in Multiple Matrix Sampling
When Items are Sampled With and Without Replacement

Tej N. Pandey
California State Department of Education 

INTRODUCTION 

Multiple matrix sampling is a procedure in which a universe of
test items is subdivided into more than one test form, with each form
administered to a certain number of examinees. Although each examinee
is administered only a portion of the test items in the total pool, the
results from each form administered may be used to estimate the
parameters of the matrix universe and associated standard errors. Several
states, for example California, Oregon, and New Mexico, are using multiple
matrix sampling procedures advantageously for their statewide assessment
programs, providing group information at lower cost and with less
testing time than traditional testing procedures.

A review of the matrix sampling literature dealing with the estimation
of the mean and associated standard error indicates that the major emphasis
in matrix sampling item allocation designs is on those which allocate an
equal number of items to each form and in which items are sampled without
replacement. The equations for estimating the standard error of the pooled
mean under these assumptions have been given by Lord and Novick (1968) for
dichotomous item scores and by Pandey and Shoemaker (1975) for polychotomous
item scores. The available computer programs utilize one or the other
of these equations to compute the standard error of the pooled mean.

To date, not much attention has been given to estimation of the
standard error of the pooled mean for item sampling designs that allocate
items both with and without replacement, possibly with unequal numbers of
items in the forms. The need for such applications is common in
reading tests for lower grades, where part of each test form requires
simultaneous oral administration and the remaining items may be unique item
samples from the item pool. Furthermore, in areas such as reading
comprehension, where items are related to a passage, allocating an equal
number of items to each form may not always be possible.

This paper presents the relevant equation from Madow (1972) for estimating
the standard error of the pooled mean in matrix sampling. The equation was
derived for cases of stratified samples of persons and items, with possibly
unequal sizes of samples, and with possible overlap of samples. Madow's
derivations utilize the conditional variance theorem rather than the polykays
and bipolykays used by Lord and Novick. This paper presents the equation
modified so as to be applicable to cases of items sampled both with and
without replacement. Also, a more general equation for the estimation of the
standard error of the mean has been derived starting with Lord and Novick's
formulations; it is shown to be equivalent to Madow's equation for
the special case of sampling of items without replacement.

For a typical data set, this paper compares the standard errors of the
pooled mean computed using Madow's equation with those computed from
Lord and Novick's equation using certain approximations to satisfy the
assumptions underlying the equation.

NOTATIONS

Let us suppose a population $U$ of $N$ persons and a universe $V$ of
$K$ items. It is proposed to estimate the average score, $\bar{X}$, that
would be obtained if all $N$ persons in $U$ took all $K$ items in $V$, and
the standard error of the estimated mean. Furthermore, suppose the
population $U$ has been stratified into $G$ strata $U_1, U_2, \ldots, U_G$,
where $U_g$ consists of $N_g$ persons, $g = 1, 2, \ldots, G$, and
$\sum_g N_g = N$. Also, suppose that the universe $V$ has been stratified
into $D$ strata $V_1, V_2, \ldots, V_D$, where $V_d$ consists of $K_d$
items, $d = 1, 2, \ldots, D$, and $\sum_d K_d = K$.

Suppose that $T$ samples, $u_{tg}$, are selected by simple random sampling
from $U_g$, and $T$ samples, $v_{td}$, are selected by simple random sampling
from $V_d$; the number of elements in $u_{tg}$ is $n_{tg}$ and the number of
elements in $v_{td}$ is $k_{td}$, $t = 1, 2, \ldots, T$. Denote by $u_t$ the
sample consisting of the elements of $u_{t1}, u_{t2}, \ldots, u_{tG}$ and by
$v_t$ the sample consisting of the elements of
$v_{t1}, v_{t2}, \ldots, v_{tD}$, respectively. The pair $(u_t, v_t)$ is
defined as the $t$th stratified matrix sample.

COMPUTATIONAL FORMULAS 

Following Madow (1972), the computational formula for the unbiased estimate 
of the mean through multiple matrix sampling is: 



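where $c_t$, $c_{t'}$ are constants, and

$$\bar{x}_t = \sum_{g=1}^{G} \sum_{d=1}^{D} \frac{N_g K_d}{N K} \, \bar{x}_{tgd}$$

and

$$\bar{x}_{tgd} = \frac{1}{n_{tg} k_{td}} \sum_{i \in u_{tg}} \sum_{j \in v_{td}} x_{tgdij}.$$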
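
As a rough illustration of the cell and form means used here, the sketch below computes the mean score in each (person stratum, item stratum) cell of one matrix sample and combines the cells into a form mean. The weights $N_g K_d / (N K)$, the function names `cell_mean` and `form_mean`, and the toy scores are assumptions made for illustration; they follow the usual stratified-sampling convention and are not taken from Madow's report.

```python
def cell_mean(scores):
    """Mean of an n_tg x k_td block of item scores (a list of rows)."""
    total = sum(sum(row) for row in scores)
    count = sum(len(row) for row in scores)
    return total / count


def form_mean(cells, N_g, K_d):
    """Weighted mean for one matrix sample t.

    cells[g][d] holds the score block for person stratum g and item stratum d.
    N_g and K_d are the stratum sizes in the population and the item universe.
    Assumed weight for cell (g, d): N_g[g] * K_d[d] / (N * K).
    """
    N, K = sum(N_g), sum(K_d)
    return sum(
        N_g[g] * K_d[d] / (N * K) * cell_mean(cells[g][d])
        for g in range(len(N_g))
        for d in range(len(K_d))
    )


# Toy example: one person stratum, two item strata (binary item scores).
cells = [[
    [[1, 0], [1, 1]],        # 2 persons x 2 items from item stratum 1
    [[1, 1, 0], [0, 1, 0]],  # 2 persons x 3 items from item stratum 2
]]
x_t = form_mean(cells, N_g=[100], K_d=[12, 200])
```

With a single person stratum the person weight is 1, so the form mean is simply the item-stratum-weighted average of the two cell means.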

Assuming no strata in the sampling of persons, but two strata for the sampling
of items (one stratum for the sampling of items with replacement, the other for
the sampling of items without replacement), the standard error for the estimate
of the mean in multiple matrix sampling is given by:

where the three $\sigma^2$ terms are the population variances
due to the person effect, the item effect, and the person x item effect in a
linear model of test scores. The variance representing the standard error is
a composite of three variances: those due to sampling of persons, sampling
of items, and an interaction term. Also, each of the latter two terms
is shown to be a composite of two terms, one due to sampling of items with
replacement and one due to sampling of items without replacement. If the
sampling of items is without replacement only, the above equation can be
written as:

It is easy to note that for finite $N$, the first term representing the variance
due to sampling of persons vanishes if the number of persons taking each form
is equal. Similarly, for finite $K$, the second term representing the variance
due to sampling of items vanishes if the number of items assigned to each form
is equal. Therefore, when $N = T n_t$ and $K = T k_t$ for $t = 1, 2, \ldots, T$,
the computational formula for the standard error of the mean is given by:

. Var ( X ) = JL [ T - 

NK ^ J X.. ' 

The foregoing equations indicate that the standard error of the mean can be
reduced by assigning an equal number of items to each form and administering
the forms in such a manner that an equal number of persons take each form.
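
The balanced allocation recommended here can be sketched as a simple splitting rule. The helper name `balanced_allocation` is hypothetical, and the figures (216 pupils, the Grade 2 total in Table 2, spread over 10 forms) are used only as an example; when the total is not divisible by the number of forms, the split is as close to equal as possible rather than exactly equal.

```python
def balanced_allocation(total, forms):
    """Split `total` units across `forms` forms as evenly as possible."""
    base, extra = divmod(total, forms)
    # The first `extra` forms receive one additional unit.
    return [base + 1 if t < extra else base for t in range(forms)]


# Example: 216 pupils assigned to 10 forms.
pupils_per_form = balanced_allocation(216, 10)
```

The same rule could be applied to items when the number of unique items per form is being planned.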

Lord and Novick (1968) present formulas for the standard error of
the mean when the items are sampled without replacement for (a) items sampled
inexhaustively (equation 11.12.3) and (b) items sampled exhaustively (equation
11.12.4). These equations are given for binary item scoring. It is shown
here how a relatively more general equation can be arrived at from Lord and
Novick's equations (11.11.6) and (11.12.2). [Notation has been changed
here for the sake of consistency.]

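By the formula for the variance of a sum,

$$\mathrm{Var}(\bar{x}) = \frac{1}{T^2} \left[ \sum_{t} \mathrm{Var}(\bar{x}_t) + \sum_{t \neq t'} \mathrm{Cov}(\bar{x}_t, \bar{x}_{t'}) \right]$$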
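
The variance-of-a-sum identity can be checked numerically. The sketch below assumes the pooled mean is the equally weighted average of $T$ form means, so that the leading factor is $1/T^2$; the numbers in `draws` are made up for illustration, and `pop_cov` is a hypothetical helper computing population (divide-by-n) moments.

```python
def pop_cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


# Rows are replications; columns are T = 3 hypothetical form means.
draws = [
    [0.52, 0.48, 0.50],
    [0.55, 0.51, 0.47],
    [0.49, 0.50, 0.53],
    [0.51, 0.46, 0.52],
]
T = len(draws[0])
cols = [[row[t] for row in draws] for t in range(T)]

# Left side: variance of the pooled (equally weighted) mean.
pooled = [sum(row) / T for row in draws]
lhs = pop_cov(pooled, pooled)

# Right side: (1/T^2) [ sum_t Var(x_t) + sum_{t != t'} Cov(x_t, x_t') ].
var_sum = sum(pop_cov(cols[t], cols[t]) for t in range(T))
cov_sum = sum(
    pop_cov(cols[t], cols[u]) for t in range(T) for u in range(T) if t != u
)
rhs = (var_sum + cov_sum) / T**2
```

The two sides agree up to floating-point rounding, which is why the derivation can proceed by evaluating the variance and covariance terms separately.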

using equation (11.11.6), it can be shown that 


Also using equation (11.12.2), it can be shown that 






Combining (11.11.6) and (11.12.2), we get 




which is the same as the equation derived by Madow. In the opinion of the
author, a computer program using the above equation will be more useful than
programming for equations (11.12.3) and (11.12.4). The latter are special
cases of the above equation.

STANDARD ERROR APPROXIMATION FROM LORD AND NOVICK'S EQUATION

The underlying assumptions of equation (11.12.4) for estimating the standard
error of the mean are that items are sampled exhaustively and without
replacement. However, for designs involving sampling of items both with and
without replacement, approximate results can be obtained by inflating the size
of the item universe, as if each of the items sampled with replacement were a
unique item. For example, for a multiple matrix sampling design involving $T$
forms, in which $k_1$ items are with replacement and $k_2$ items are without
replacement, the inflated finite item universe is $T(k_1 + k_2)$.

However, it is to be noted that the unique item universe is only $k_1 + k_2 T$.
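
The two universe sizes can be worked out for the design described in the DATA section (10 forms, 12 items repeated on every form, 20 unique items per form). The sketch below is an illustration only: the function names are hypothetical, and the inflated count assumes that every repeated item is counted once per form, which is one reading of the inflation rule described above.

```python
def unique_item_universe(T, k1, k2):
    """Distinct items: k1 common items plus k2 unique items on each of T forms."""
    return k1 + T * k2


def inflated_item_universe(T, k1, k2):
    """Item count when every common item is treated as a new item on each form."""
    return T * (k1 + k2)


T, k1, k2 = 10, 12, 20
unique_K = unique_item_universe(T, k1, k2)      # 12 + 10 * 20
inflated_K = inflated_item_universe(T, k1, k2)  # 10 * (12 + 20)
```

The gap between the two counts is $(T - 1) k_1$, the number of extra "items" created by treating each repetition of a common item as unique.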

DATA

The data were collected as part of the California Assessment Program
involving the Reading Test for grades 2 and 3. The assessment item pool
consisted of 212 items in a multiple-choice format. The items were divided
into ten nearly parallel forms. When assigning items to forms, 12 items,
involving oral presentation of the stimuli, were repeated across all ten
forms. The remaining 20 items in each form were unique items. The test was
administered to 457 second and third grade pupils in a typical California
school district according to the standardized testing procedures described
in the manual. The standard errors of the pooled mean were computed using
the exact formula as well as approximations to Lord and Novick's formula.
For the approximate results with a finite item universe, the item universe
was taken as 320 instead of 212. The results were computed for finite and
infinite item universes as well as finite and infinite populations. The
results are given in Table 1.

RESULTS AND IMPLICATIONS

The purpose of comparing the two methods of computing the standard error is
not to show whether there are any differences between the two estimates, but
rather to show how trivial or large the differences are for a typical data
set. The results of this investigation show that for data collected using the
specified item sampling design, the estimates of the standard error of the
mean computed from Madow's formulation differ considerably from those
computed from approximations to Lord and Novick's equation. If the total
error in computing the pooled mean is viewed as a composite of contributions
due to sampling of persons, sampling of items, and an interaction term, the
major differences appear in the term representing the error due to the
sampling of items. This term is considerably overestimated by the
approximations to Lord and Novick's equation.

It should be emphasized that Lord and Novick's equation is recommended
for use when item sampling designs are based upon sampling of items without
replacement. The virtue of using this equation for approximations lies only
in its computational simplicity. Based on the findings from this particular
item sampling design, however, it is recommended that exact computational
procedures be used when item sampling designs involve sampling of items both
with and without replacement.

Table 1

Comparison of the Standard Errors of the Pooled Mean

Sampling                      Exact Equation    Approximation to
                                                Lord & Novick's Equation

Grade 2
  N finite, K finite              0.004             0.026
  N finite, K infinite            0.014             0.028
  N infinite, K finite            0.015             0.030
  N infinite, K infinite          0.020             0.031

Grade 3
  N finite, K finite              0.003             0.020
  N finite, K infinite            0.011             0.021
  N infinite, K finite            0.012             0.023
  N infinite, K infinite          0.016             0.024

Grade 2 and Grade 3
  N finite, K finite              0.002             0.021
  N finite, K infinite            0.010             0.022
  N infinite, K finite            0.010             0.023
  N infinite, K infinite          0.014             0.024


Table 2

Number of Pupils Taking Each Form

                      FORM 1  FORM 2  FORM 3  FORM 4  FORM 5  FORM 6  FORM 7  FORM 8  FORM 9  FORM 10  TOTAL

Grade 2                 24      21      24      22      20      22      22      21      21      19      216
Grade 3                 25      23      27      22      27      23      22      24      25      23      241
Grade 2 and Grade 3     49      44      51      44      47      45      44      45      46      42      457